ABSTRACT

The problems of estimating the total number of measurement points and the optimum spatial distribution of locations on a structure are approached theoretically in this report. The significant factors to be considered are statistical reliability and economy; therefore, the relationships are developed with the emphasis on measurement efficiency. Random, systematic, and stratified sampling methods are compared for efficiency in estimating mean values. Then the optimum allocation of a fixed number of measurement points in stratified sampling is developed, and illustrative examples are given. Finally, relationships are presented which allow the total sample size to be estimated under the assumptions of normal and log-normal sampling distributions as well as by a nonparametric approach. These formulas should be useful for planning experiments.

This abstract is subject to special export controls and each transmittal to foreign governments or foreign nationals may be made only with prior approval of the Air Force Flight Dynamics Laboratory (FDTR), Wright-Patterson Air Force Base, Ohio 45433.
GLOSSARY OF SYMBOLS

$A$: area of the structure
$A_h$: area of the structure in the $h$th zone
$d$: allowable measurement error
$d'$: allowable error in the log regime
$L$: number of nonoverlapping zones (disjoint strata)
$n$: total number of data points in a sample
$n_h$: number of data points in the $h$th zone
$N$: maximum sample size possible
$P$: probability that a sample value will exceed a critical value
$P_h$: probability that a sample value within the $h$th zone will exceed a critical value
$s^2$: sample variance
$w$: estimator for $\log\mu_X$, $w = \overline{\log X} + \sigma^2/2$
$W_h$: weighting function associated with the $h$th zone
$X$: random variable, used here as mean square stress
$X_i$: $i$th sample value of $X$
$X_c$: established critical value for $X$
$\bar{X}$: sample mean of $X$
$Y$: derived random variable
$Y_i$: value of $Y$ derived from the $i$th sample value of $X$
$Z$: random variable having a normal distribution
$Z_{\alpha/2}$: the $100\,\alpha/2$ percentage point of a standardized normal distribution
$\alpha$: statistical level of significance
$\sigma$: true standard deviation
$\sigma^2$: true variance
$\mu$: true mean
$E[\;]$: expected value of
$\mathrm{Var}[\;]$: variance of
$\mathrm{Prob}[\;]$: probability that
$\mathrm{SE}[\;]$: standard error in
$(\hat{\;})$: estimated value of
$(\bar{\;})$: space averaged value of
1. INTRODUCTION

There are basically two statistical problems associated with sampling the response of aircraft structure to flight loads. Since these loads are typically representative of a time-varying random process, the first item of concern is the time-averaged statistical properties at a single point on the structure. Given a sufficiently long sample record, these properties can be estimated with reasonable accuracy, and for stationary random loading conditions the necessary record lengths are not difficult to obtain.

The second basic problem concerns point-to-point variation on a structure. Having evaluated conditions at a single point as a function of time, one needs to know how the remainder of the structure is behaving. It is this facet of sampling which is the concern of this report.

The relationships between the number of data points in a structural sample and the accuracy in estimating two statistical properties of the data are discussed in detail. The two properties investigated are the mean value and the probability that a point selected at random on the structure will exhibit a sample value which exceeds some specified level. The developments are aimed at providing tools for planning statistical loads experiments. In Section 2, three different methods of sampling are described along with their relative efficiencies. Section 3 gives formulas for the optimum allocation of points in a sample when the structure has been partitioned into zones for study. Section 4 gives methods for determining the total number of points in a sample which will correspond to a required degree of accuracy in the estimates. This is done for the normal distribution, the log normal distribution, and a nonparametric approach which treats the distribution as unknown.
2. SAMPLING METHODS

In this section, three methods of spatial sampling are discussed in terms of the parameters needed to assess their relative efficiencies.

The example random variable being sampled is mean square stress, although any time-averaged measure of load response could be substituted. It is assumed in all cases that the statistical uncertainty associated with finite sample record length is negligible compared with spatial variation. In the following, $n$ denotes the number of data points in a sample and $X_i$ the $i$th independent mean square stress measurement.

2.1 RANDOM SAMPLING

If mean square stress is measured at $n$ points on a structure selected at random, one has a random sample of data. It is assumed that every point on the structure has been given an independent and equal probability of being a data point. The mean value of the data in the sample thus gathered is given by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (1)$$


The variance of the sample mean is defined as

$$\mathrm{Var}(\bar{X}) = \frac{N-1}{N}\,\frac{\sigma^2}{n} \qquad (2)$$

where $\sigma^2$ is the variance based on the maximum possible sample size, $N$. Since $N$ is usually considered infinite for structures, the factor $(N-1)/N$ is equal to one. For a sample of size $n$, the variance of the sample mean can be estimated by

$$\widehat{\mathrm{Var}}(\bar{X}) = \frac{s^2}{n} \qquad (3)$$

with $(\hat{\;})$ denoting an estimated value. The term $s^2$, which estimates $\sigma^2$, is the unbiased sample variance and is given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \qquad (4)$$

To estimate the probability $P$ ($0 < P < 1$) that the mean square stress at any point on the structure selected at random will exceed some critical value $X_c$, proceed as follows. Define a new random variable $Y$ with the following property: for any measurement in the sample, $Y_i$ equals one if $X_i \ge X_c$ and is zero if $X_i < X_c$. Since the probability that $X_i \ge X_c$ is $P$, it follows that the expected value of each $Y_i$ is

$$E[Y] = 1\cdot P + 0\cdot(1-P) = P \qquad (5)$$


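Then, if the random sample consists of $n$ measurements, the estimate of $P$ is given by

$$\hat{P} = \frac{1}{n}\sum_{i=1}^{n} Y_i \qquad (6)$$

For a variable such as $Y$, which can assume only the values one and zero (so that $Y_i^2 = Y_i$), the sample variance can be expressed as

$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \hat{P}\right)^2 = \frac{n\,\hat{P}(1-\hat{P})}{n-1} \qquad (7)$$

The variance of $\hat{P}$, using the relationship employed in Eq. (3), is then estimated by

$$\widehat{\mathrm{Var}}(\hat{P}) = \frac{s_Y^2}{n} = \frac{\hat{P}(1-\hat{P})}{n-1} \qquad (8)$$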
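As a minimal sketch of Eqs. (6) and (8), the Python function below converts each measurement to the indicator $Y_i$ and returns $\hat{P}$ with its estimated variance. The function name `exceedance_estimate` is illustrative, and the critical level $X_c = 9.0\times10^7\ (\text{psi})^2$ in the usage lines is a hypothetical value; the report does not specify one.

```python
def exceedance_estimate(values, x_c):
    """Estimate P = Prob(X >= x_c) and Var(P_hat) from a random sample.

    Eq. (6): P_hat = (1/n) * sum(Y_i)
    Eq. (8): Var(P_hat) = P_hat * (1 - P_hat) / (n - 1)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    # Indicator variable Y_i: 1 if X_i >= x_c, otherwise 0
    y = [1 if x >= x_c else 0 for x in values]
    p_hat = sum(y) / n
    var_p_hat = p_hat * (1.0 - p_hat) / (n - 1)
    return p_hat, var_p_hat


# Table 1 data in units of 1e7 (psi)^2; the critical level 9.0 is hypothetical
table1 = [10.2, 11.5, 5.6, 7.1, 5.9, 6.3, 8.7, 9.5, 9.0]
p_hat, var_p = exceedance_estimate(table1, 9.0)
print(f"P_hat = {p_hat:.3f}, Var(P_hat) = {var_p:.4f}")
```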


2.2 SYSTEMATIC SAMPLING

The second method of sampling requires the sample points to be laid out on a structure in a systematic fashion. For example, in a one-dimensional structure, this may amount to taking measurements at uniform intervals along the single dimension. Systematic sampling thus places a restriction on sample point location, which was not the case for random sampling. One unfavorable aspect of this technique is that data cannot be evaluated correctly when certain periodic trends exist, for example when the sampling interval coincides with the period of the trend. This problem is discussed in Reference 1 along with examples of special cases; in general, this sampling method should be avoided when periodic trends exist in the data. On the other hand, when a linear trend exists in the data, systematic sampling becomes much more efficient than random sampling. Because of normal mode response, periodic trends are more likely than linear trends to exist in aircraft structural stress.

The mean value of a systematic sample of mean square stress in a structure is given by Eq. (1), and the variance of the sample mean is closely approximated by

$$\mathrm{Var}(\bar{X}) \approx \frac{N-1}{N}\,\sigma^2 \qquad (9)$$

when the variance due to finite time sampling is very small. Since the maximum possible sample size, $N$, can usually be considered infinite for structural sampling, the factor $(N-1)/N$ is equal to one, and the variance of the sample mean is simply $\sigma^2$; that is, it is independent of the sample size $n$. By comparing Eqs. (2) and (9), it is apparent that systematic sampling is, in general, less efficient than random sampling. In view of these shortcomings, systematic sampling will not be considered further in the discussion.

2.3 STRATIFIED SAMPLING

Using this method, a structure is partitioned into $L$ nonoverlapping zones (disjoint strata), and $n$ measurement points are allocated over the structure in such a fashion that each zone contains at least two randomly located points. The sample mean of the measurements is computed from

$$\bar{X} = \sum_{h=1}^{L} W_h\,\bar{X}_h \qquad (10)$$

where $W_h$ is the weighting function associated with the $h$th zone and $\bar{X}_h$ is the mean value for that zone. Here it can be seen that the interpretation of the weighting function, $W_h$, can be extremely important. Although many engineering factors may be involved in this interpretation, simple area relationships will be used in this development. Then, $W_h$ will be defined as

$$W_h = \frac{A_h}{A} \qquad (11)$$

where $A_h$ is the area of the $h$th zone, and $A$ is the total area of the structure under study. The variance of the sample mean is given by

$$\mathrm{Var}(\bar{X}) = \sum_{h=1}^{L}\frac{W_h^2\,\sigma_h^2}{n_h} \qquad (12)$$

where $n_h$ is the number of points in the $h$th zone, and $\sigma_h^2$ is the true zone variance. The variance of the sample mean is estimated from

$$\widehat{\mathrm{Var}}(\bar{X}) = \sum_{h=1}^{L}\frac{W_h^2\,s_h^2}{n_h} \qquad (13)$$

where $s_h^2$ is the sample variance from the $h$th zone.

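The efficiency of stratified sampling in estimating mean values can be compared to that of random sampling by the following relationship, using Eqs. (2) and (12):

$$\text{Relative Efficiency} = \frac{\mathrm{Var}(\bar{X}\ \text{by random sampling})}{\mathrm{Var}(\bar{X}\ \text{by stratified sampling})} = \frac{\sigma^2/n}{\displaystyle\sum_{h=1}^{L} W_h^2\,\sigma_h^2/n_h} \qquad (14)$$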
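The stratified estimators of Eqs. (10), (11), and (13), together with the ratio of Eq. (14), can be sketched in a few lines of Python. The function names are illustrative, and the three-zone data and areas in the usage lines are invented for demonstration, not taken from the report; the random-sampling variance is estimated, as in the report's example, by treating all the points as one random sample with Eq. (3).

```python
from statistics import mean, variance


def stratified_estimates(zone_values, zone_areas):
    """Stratified mean, Eq. (10), and its estimated variance, Eq. (13).

    Weights are area ratios, W_h = A_h / A (Eq. 11). Each zone needs at
    least two measurements so that s_h^2 (divisor n_h - 1) is defined.
    """
    total_area = sum(zone_areas)
    x_bar = 0.0
    var_x_bar = 0.0
    for values, area in zip(zone_values, zone_areas):
        w_h = area / total_area
        x_bar += w_h * mean(values)
        var_x_bar += w_h ** 2 * variance(values) / len(values)
    return x_bar, var_x_bar


def relative_efficiency(var_random, var_stratified):
    """Eq. (14): random-sampling variance over stratified-sampling variance."""
    return var_random / var_stratified


# Hypothetical three-zone data (arbitrary units) and zone areas
zones = [[10.0, 11.0], [6.0, 6.5, 7.0], [9.0, 9.4]]
areas = [40.0, 50.0, 10.0]
x_bar, var_x_bar = stratified_estimates(zones, areas)

# Same points treated as a single random sample, Eq. (3)
pooled = [x for zone in zones for x in zone]
var_random = variance(pooled) / len(pooled)
print(f"mean = {x_bar:.2f}, Var(mean) = {var_x_bar:.4f}, "
      f"relative efficiency = {relative_efficiency(var_random, var_x_bar):.1f}")
```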


It can be seen from Eq. (14) that stratified sampling efficiency increases with either an increase in $\sigma^2$ or a decrease in the $\sigma_h^2$. Therefore, to achieve high efficiency using this method, the structure should be zoned so that the statistical properties of the zones are quite different from each other but the data within each zone are very similar. In practice, the within-zone variances will usually be found to be smaller than the overall structure variance, so stratified sampling will generally be superior to random sampling. This may be true even when the zoning operation is performed after an initial attempt at random sampling; often the results of random sampling provide the best indicators of natural zone boundaries. For example, assume that mean square stress has been measured at nine points selected at random on an aircraft wing, as illustrated in Figure 1.
Further, assume that the data values at each point given in Table 1 have been obtained from long records during stationary loading conditions, so that the statistical uncertainty of each measurement is negligible.

Data Point      Measured ms Stress, $(\text{psi})^2$
1               $10.2 \times 10^7$
2               $11.5 \times 10^7$
3               $5.6 \times 10^7$
4               $7.1 \times 10^7$
5               $5.9 \times 10^7$
6               $6.3 \times 10^7$
7               $8.7 \times 10^7$
8               $9.5 \times 10^7$
9               $9.0 \times 10^7$

Table 1


Then, if it is required to estimate the mean value of mean square stress in the wing with low variance in the estimate, proceed as follows.

Using Eq. (1) to compute the sample mean for the values in Table 1 results in

$$\bar{X} = \frac{1}{9}\sum_{i=1}^{9} X_i = 8.2 \times 10^7\ (\text{psi})^2$$

The sample variance is, from Eq. (4),

$$s^2 = \frac{1}{8}\sum_{i=1}^{9}\left(X_i - 8.2\times10^7\right)^2 = 4.3 \times 10^{14}\ (\text{psi})^4$$

and the estimated variance of the sample mean is given by Eq. (3) for random sampling:

$$\widehat{\mathrm{Var}}(\bar{X}) = \frac{4.3\times10^{14}}{9} = 0.48 \times 10^{14}\ (\text{psi})^4$$
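The arithmetic of this example can be checked with a short Python sketch that uses only the standard library. The data are scaled by $10^7\ (\text{psi})^2$, so the variances come out in units of $10^{14}\ (\text{psi})^4$; `statistics.variance` uses the $n-1$ divisor of Eq. (4).

```python
from statistics import mean, variance

# Table 1: measured mean square stress, in units of 1e7 (psi)^2
table1 = [10.2, 11.5, 5.6, 7.1, 5.9, 6.3, 8.7, 9.5, 9.0]
n = len(table1)

x_bar = mean(table1)       # Eq. (1), units of 1e7 (psi)^2
s2 = variance(table1)      # Eq. (4), divisor n - 1, units of 1e14 (psi)^4
var_x_bar = s2 / n         # Eq. (3), units of 1e14 (psi)^4

print(f"mean = {x_bar:.1f}e7, s^2 = {s2:.2f}e14, Var(mean) = {var_x_bar:.2f}e14")
```

The unrounded sample variance is about $4.29\times10^{14}\ (\text{psi})^4$, which the example rounds to $4.3\times10^{14}$.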


Now, noting the relationship between the stress level and the physical location of the various points in the sample, it would seem likely that greater sampling efficiency (less variance in the estimate of the mean) could be achieved if the wing were partitioned into zones such as those shown in Figure 2.

Examining the same data from the standpoint of stratified sampling, where the weighting factor is based on the ratio of zone area to wing area alone, the parameters can be summarized as in Table 2.

Table 2


The sample mean calculated using Eq. (10) is

$$\bar{X} = \sum_{h=1}^{3} W_h\,\bar{X}_h = 8.7 \times 10^7\ (\text{psi})^2$$

and the estimate of its variance is, by Eq. (13),

$$\widehat{\mathrm{Var}}(\bar{X}) = \sum_{h=1}^{3}\frac{W_h^2\,s_h^2}{n_h} = 0.066 \times 10^{14}\ (\text{psi})^4$$

Thus, although the estimate of the mean value of mean square stress in the wing is nearly the same when using random or stratified sampling, the latter method gives much more confidence in the result: the estimated variance of the mean is smaller by a factor of about seven (0.48 versus 0.066, in units of $10^{14}\ (\text{psi})^4$).

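The probability that the mean square stress measured at a point in any of the zones will equal or exceed a specified critical level can be estimated using a technique similar to that applied in Section 2.1 for random sampling. That is, a random variable $Y_i$ assumes the value one if $X_i \ge X_c$ and zero if $X_i < X_c$. Then the desired probability can be estimated from

$$\hat{P} = \sum_{h=1}^{L} W_h\,\hat{P}_h \qquad (15)$$

where

$$\hat{P}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} Y_{hi} \qquad (16)$$

is the fraction of the $n_h$ points in the $h$th zone at which $X_c$ is equaled or exceeded. The variance is then estimated by

$$\widehat{\mathrm{Var}}(\hat{P}) = \sum_{h=1}^{L}\frac{W_h^2\,\hat{P}_h(1-\hat{P}_h)}{n_h - 1} \qquad (17)$$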
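A minimal Python sketch of Eqs. (15) through (17), assuming area weights as in Eq. (11). The function name, the zone data, the areas, and the critical level used below are all hypothetical and serve only to show the bookkeeping.

```python
def stratified_exceedance(zone_values, zone_areas, x_c):
    """Stratified estimate of P = Prob(X >= x_c) and its variance.

    Eq. (16): P_h = fraction of zone-h points with X >= x_c
    Eq. (15): P_hat = sum of W_h * P_h
    Eq. (17): Var(P_hat) = sum of W_h^2 * P_h * (1 - P_h) / (n_h - 1)
    """
    total_area = sum(zone_areas)
    p_hat = 0.0
    var_p_hat = 0.0
    for values, area in zip(zone_values, zone_areas):
        n_h = len(values)
        w_h = area / total_area                          # Eq. (11)
        p_h = sum(1 for x in values if x >= x_c) / n_h   # Eq. (16)
        p_hat += w_h * p_h
        var_p_hat += w_h ** 2 * p_h * (1.0 - p_h) / (n_h - 1)
    return p_hat, var_p_hat


# Hypothetical zones (arbitrary units), areas, and critical level
zones = [[10.0, 11.0], [6.0, 6.5, 7.0], [9.0, 9.4]]
areas = [40.0, 50.0, 10.0]
p_hat, var_p = stratified_exceedance(zones, areas, x_c=10.5)
print(f"P_hat = {p_hat:.2f}, Var(P_hat) = {var_p:.3f}")
```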
3. OPTIMUM ALLOCATION IN STRATIFIED SAMPLING

Assuming that stratified sampling has been chosen as the tool for estimating statistical properties of mean square stress in a structure, the optimum allocation of a fixed sample size $n$ can be determined as follows. For the case where the mean value of spatially distributed mean square stress measurements is to be estimated, the optimum allocation is obtained by minimizing Eq. (12) with respect to the $n_h$, subject to $\sum_{h=1}^{L} n_h = n$. Letting the weighting function depend on area ratios alone, the allocation for each zone is determined from

$$n_h = n\,\frac{A_h\,\sigma_h}{\displaystyle\sum_{h=1}^{L} A_h\,\sigma_h} \qquad (18)$$

As an example of the use of Eq. (18), suppose the problem is to obtain the best estimate of the mean value of mean square stress in the wing illustrated in Figures 1 and 2, employing a total of nine transducers and making use of the preliminary data in Tables 1 and 2. After zoning the wing structure as shown in Figure 2 and computing the area and sample standard deviation $s_h$ (used in place of $\sigma_h$) for each zone, the optimum allocation is determined in Table 3.


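Zone    $A_h$    $s_h$, $(\text{psi})^2$    $A_h s_h$             $A_h s_h / \sum A_h s_h$    $n_h$
1       44       $0.92 \times 10^7$         $40.5 \times 10^7$    0.46                        4
2       49       $0.65 \times 10^7$         $31.8 \times 10^7$    0.35                        3
3       40       $0.41 \times 10^7$         $16.4 \times 10^7$    0.19                        2

Table 3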
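The allocation in Table 3 can be reproduced with a short Python sketch of Eq. (18), which also evaluates Eq. (13) for the resulting $n_h$ using the same $s_h$. The helper names are illustrative. The computed variance is a planning estimate only: it assumes the zone standard deviations stay at their preliminary values.

```python
def optimum_allocation(n, areas, sigmas):
    """Eq. (18): n_h proportional to A_h * sigma_h (fractional values)."""
    products = [a * s for a, s in zip(areas, sigmas)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_variance(areas, sigmas, n_h):
    """Eq. (12)/(13) with area weights W_h = A_h / A."""
    total_area = sum(areas)
    return sum((a / total_area) ** 2 * s ** 2 / k
               for a, s, k in zip(areas, sigmas, n_h))


areas = [44.0, 49.0, 40.0]   # A_h from Table 3
s_h = [0.92, 0.65, 0.41]     # s_h from Table 3, units of 1e7 (psi)^2
alloc = optimum_allocation(9, areas, s_h)
n_h = [round(x) for x in alloc]
print([f"{x:.2f}" for x in alloc], n_h)
print(f"Var(mean) = {stratified_variance(areas, s_h, n_h):.3f}e14 (psi)^4")
```

Simple rounding happens to give 4, 3, 2 here, summing to nine; in general the rounded values may need adjustment so that they sum to $n$.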
As would be expected, optimum allocation requires a larger proportion of the measurements to be assigned to the larger and more variable zones. In the special case when the variances $\sigma_h^2$ are equal for all zones, the optimum allocation of $n$ reduces to

$$n_h = n\,\frac{A_h}{A} \qquad (19)$$

That is, the allocation for each zone should be proportional to structural area alone (or to another significant weighting factor). This special case is called "simple stratification" and is the most commonly used method when the zone variances are unknown.

When sampling is conducted for the purpose of determining the probability that a mean square stress measured at any point exceeds a critical value, the optimum allocation of sample points is given by

$$n_h = n\,\frac{A_h\sqrt{P_h(1-P_h)}}{\displaystyle\sum_{h=1}^{L} A_h\sqrt{P_h(1-P_h)}} \qquad (20)$$

The parameter $P_h$ is, of course, unknown and must be either assumed or estimated from preliminary data.
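A minimal Python sketch of Eqs. (19) and (20). The function names are illustrative, the areas are those of Table 3, and the guessed zone probabilities `p_guess` are hypothetical stand-ins for the preliminary estimates the text calls for.

```python
import math


def allocation_for_probability(n, areas, p_h):
    """Eq. (20): n_h proportional to A_h * sqrt(P_h * (1 - P_h))."""
    weights = [a * math.sqrt(p * (1.0 - p)) for a, p in zip(areas, p_h)]
    total = sum(weights)
    return [n * w / total for w in weights]


def simple_stratification(n, areas):
    """Eq. (19): n_h proportional to zone area alone."""
    total = sum(areas)
    return [n * a / total for a in areas]


areas = [44.0, 49.0, 40.0]   # A_h from Table 3
p_guess = [0.5, 0.1, 0.2]    # hypothetical prior guesses of P_h
print([round(x, 2) for x in allocation_for_probability(9, areas, p_guess)])
print([round(x, 2) for x in simple_stratification(9, areas)])
```

Note that a zone whose guessed $P_h$ is exactly zero or one receives no points under Eq. (20), so guesses at those extremes are best avoided.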

If the object of the investigation is to make comparisons between different zones, the rules for allocating the number of samples to each zone are slightly different from those which applied for the above developments. For example, it may be desired to compare the mean of the measurements in two regions of a structure, in two different structures, or in similar structures on two aircraft. If $\bar{X}_1$ and $\bar{X}_2$ denote the means of the data in two regions of interest, then
$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (21)$$

If $\mathrm{Var}(\bar{X}_1 - \bar{X}_2)$ is minimized with respect to $n_1$ and $n_2$, subject to $n_1 + n_2 = n$, one obtains

$$n_h = n\,\frac{\sigma_h}{\sigma_1 + \sigma_2}, \qquad h = 1, 2 \qquad (22)$$

Equation (22) indicates that the number of samples in each region should be allocated in proportion to the standard deviation. Note that the sample allocation given by Eq. (22) is independent of the size of the area being considered, while that given previously by Eq. (18) was directly proportional to the area of interest. In general, if there are $L$ regions of comparative interest, the optimum allocation among them would be (see Reference 2)

$$n_h = n\,\frac{\sigma_h}{\displaystyle\sum_{h=1}^{L}\sigma_h} \qquad (23)$$
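The following Python sketch applies Eq. (23) (which reduces to Eq. (22) for two regions) and checks it against a brute-force search over integer splits of Eq. (21). The $s_h$ of zones 1 and 2 from Table 3 are used in place of $\sigma_h$, and the total of ten points is a hypothetical choice; the helper names are illustrative.

```python
def comparison_allocation(n, sigmas):
    """Eq. (23): n_h proportional to sigma_h, independent of zone area."""
    total = sum(sigmas)
    return [n * s / total for s in sigmas]


def var_difference(sigma1, sigma2, n1, n2):
    """Eq. (21): variance of the difference of two independent means."""
    return sigma1 ** 2 / n1 + sigma2 ** 2 / n2


s1, s2 = 0.92, 0.65   # s_h of zones 1 and 2 (Table 3), units of 1e7 (psi)^2
n = 10                 # hypothetical total number of points
n1, n2 = comparison_allocation(n, [s1, s2])

# Brute-force check: integer split (at least two points per region)
# that minimizes Eq. (21)
best = min(range(2, n - 1), key=lambda k: var_difference(s1, s2, k, n - k))
print(f"Eq. (22): n1 = {n1:.2f}, n2 = {n2:.2f}; best integer split n1 = {best}")
```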
4. DETERMINATION OF SAMPLE SIZE

In estimating the statistical properties of a random process from a sample consisting of a finite number of data points, the accuracy of the estimates increases with sample size. Although a high degree of accuracy is always desirable, economic considerations usually impose a practical restriction on the maximum number of points in a sample. Therefore, practical sampling procedures involve a compromise between accuracy and economy. The logical first step, then, is to define the amount of error that can be tolerated in the sample estimates. This can take the form of a statement of precision specifying the minimum probability that the difference between an estimate and the corresponding true value does not exceed a given amount. Then, if sufficient information about the distribution of the variable under investigation is available, a rational approach to determining the sample size for a given experiment can be implemented. In this section, the sample size requirements associated with two specific, well-known distributions are discussed, as well as a nonparametric approach to sample size determination.

4.1 SAMPLE SIZE UNDER A NORMALITY ASSUMPTION

Assume that the distribution of sample mean square stress measurements, $X$, in a structure tends to normality as the sample size increases. Although this assumption is limited in practice by the fact that mean square stress can never be negative, this should not materially reduce the value of the following developments.

Consider the problem of estimating the mean value of the data by $\bar{X}$ to within plus or minus $d$ units of the true mean $\mu$. The appropriate statement of precision for a probability of $(1-\alpha)$ is expressed as

$$\mathrm{Prob}\left(|\bar{X} - \mu| \le d\right) \ge 1 - \alpha \qquad (24)$$


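Now let the allowable error $d$ correspond to $Z_{\alpha/2}$ standard errors of the mean:

$$Z_{\alpha/2} = \frac{d}{\mathrm{SE}(\bar{X})} \qquad (25)$$

where $Z_{\alpha/2}$ is the $100\,\alpha/2$ percentage point of the standardized normal distribution, and

$$\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \qquad (26)$$

is the standard error in estimating mean values, obtained from Eq. (2) with $N$ infinite. A value for $\sigma$ must either be assumed or estimated from preliminary measurements. Combining Eqs. (25) and (26), the sample size required to satisfy Eq. (24) is

$$n = \frac{\sigma^2\,Z_{\alpha/2}^2}{d^2} \qquad (27)$$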


To illustrate the approach, consider the following example. Suppose it is required to estimate the mean value of mean square stress in the wing shown in Figure 1 within $\pm 10^7\ (\text{psi})^2$ with a minimum probability of 0.90 under a normality assumption. That is, if the experiment were repeated many times, the sample mean would fall between $\mu - 10^7$ and $\mu + 10^7\ (\text{psi})^2$ on at least 90% of the trials. In this case, $d = 10^7$, $1-\alpha = 0.90$, $Z_{.05} = 1.65$, and $\sigma^2$ is taken as equal to the sample variance from Section 2.3, or $\sigma^2 = 4.3\times10^{14}\ (\text{psi})^4$. Then,

$$n = \frac{4.3\times10^{14}\,(1.65)^2}{\left(10^7\right)^2} = 11.7$$

Therefore, a sample size of 12 would be appropriate for the required accuracy. Although the tolerance $\pm 10^7\ (\text{psi})^2$ might seem large, in this case it equals only about $\pm 12\%$ of the first estimate of the mean computed in the example of Section 2.3.
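Eq. (27) is simple enough to wrap in a helper; the Python sketch below (the function name is illustrative) reproduces the example with the rounded $Z_{.05} = 1.65$ and, for comparison, with the exact percentage point from `statistics.NormalDist` (Python 3.8 or later). Units are $10^7\ (\text{psi})^2$ for $d$ and $10^{14}\ (\text{psi})^4$ for $\sigma^2$.

```python
import math
from statistics import NormalDist


def sample_size_normal_mean(sigma2, d, alpha, z=None):
    """Eq. (27): n = sigma^2 * Z_{alpha/2}^2 / d^2, returned unrounded."""
    if z is None:
        # Upper 100*alpha/2 percentage point of the standard normal
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return sigma2 * z ** 2 / d ** 2


n_exact = sample_size_normal_mean(4.3, 1.0, 0.10, z=1.65)   # report's rounded Z
n_exact_z = sample_size_normal_mean(4.3, 1.0, 0.10)          # exact Z
print(f"n = {n_exact:.1f} -> {math.ceil(n_exact)} points "
      f"(exact Z gives n = {n_exact_z:.1f})")
```

Either way, rounding up gives 12 points.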

The problem of estimating the probability that the stress at a point selected on the structure at random will have a mean square value, $X$, exceeding an established critical level $X_c$ is approached in a similar manner. If it is required that the estimated probability be within plus or minus $d$ units of the true probability, the statement of precision is

$$\mathrm{Prob}\left(|\hat{P} - P| \le d\right) \ge 1 - \alpha \qquad (28)$$

where $P$ is defined as

$$P = \frac{1}{\sigma\sqrt{2\pi}}\int_{X_c}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx \qquad (29)$$

If the sample size is large (say $n > 10$), the function

$$Z = \frac{\hat{P} - P}{\mathrm{SE}(\hat{P})} \qquad (30)$$

is approximately normal with zero mean and unit variance. Since the standard error in estimating $P$ is [see Eq. (8)]

$$\mathrm{SE}(\hat{P}) = \sqrt{\frac{P(1-P)}{n-1}} \qquad (31)$$

the required sample size in this case is given by

$$n = \frac{Z_{\alpha/2}^2\,P(1-P)}{d^2} + 1 \qquad (32)$$
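A Python sketch combining Eqs. (29) and (32). The planning inputs are assumptions for illustration: the mean and variance of the Section 2.3 example are taken as $\mu$ and $\sigma^2$ (in units of $10^7\ (\text{psi})^2$), the critical level $X_c = 10$ and the tolerance $d = 0.1$ on $P$ are hypothetical, and $Z_{.05} = 1.65$. The worst case $P = 0.5$, where $P(1-P)$ is largest, is shown as well.

```python
import math
from statistics import NormalDist


def exceedance_probability_normal(mu, sigma, x_c):
    """Eq. (29): P = Prob(X >= x_c) for normally distributed X."""
    return 1.0 - NormalDist(mu, sigma).cdf(x_c)


def sample_size_probability(p, d, z):
    """Eq. (32): n = Z^2 * P * (1 - P) / d^2 + 1, returned unrounded."""
    return z ** 2 * p * (1.0 - p) / d ** 2 + 1.0


# Hypothetical planning inputs (see lead-in)
p = exceedance_probability_normal(8.2, math.sqrt(4.3), 10.0)
n = sample_size_probability(p, 0.1, 1.65)
n_worst = sample_size_probability(0.5, 0.1, 1.65)   # P(1 - P) largest at P = 0.5
print(f"P = {p:.3f}, n = {math.ceil(n)}, worst case n = {math.ceil(n_worst)}")
```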
4.2 SAMPLE SIZE UNDER A LOG NORMAL ASSUMPTION

Since the mean square value of a stress can never be negative, the normality assumption clearly cannot hold for all data. Some form of skewed distribution might then provide a more suitable model for this or similar situations. One which has been well studied, and which is described in this section, is the log normal distribution.

The random variable $X$ is said to have a log normal distribution if $Y = \log X$ is normally distributed; that is, if $Y$ is normal, then $X = e^Y$ is log normal. (Throughout this section, $\log$ denotes the natural logarithm.)

The density function of $X$, $f_X(x)$, is

$$f_X(x) = \begin{cases} \dfrac{d}{dx}\,\mathrm{Prob}(X \le x) = \dfrac{d}{dx}\,F_Y(\log x) = \dfrac{1}{x}\,f_Y(\log x) = \dfrac{1}{x\,\sigma\sqrt{2\pi}}\,e^{-(\log x - \mu)^2/2\sigma^2}, & x > 0 \\[2ex] 0, & x \le 0 \end{cases} \qquad (33)$$

where $F_Y$ denotes a normal distribution function with mean $\mu$ and variance $\sigma^2$. The mean of $X$, $\mu_X$, in terms of the parameters of $Y$ is

$$\mu_X = \int_{0}^{\infty} x\,f_X(x)\,dx = \int_{-\infty}^{\infty} e^{y}\,f_Y(y)\,dy = e^{\mu + \sigma^2/2} \qquad (34)$$

The variance of $X$, $\sigma_X^2$, is

$$\sigma_X^2 = E(X^2) - \mu_X^2 = \int_{0}^{\infty} x^2 f_X(x)\,dx - \mu_X^2 = \int_{-\infty}^{\infty} e^{2y} f_Y(y)\,dy - e^{2\mu+\sigma^2} = e^{2\mu+2\sigma^2} - e^{2\mu+\sigma^2} = e^{2\mu+\sigma^2}\left(e^{\sigma^2} - 1\right) \qquad (35)$$

It is important to note that the transformation $e^Y$ transforms the mean as well as the variance: $\mu_X$ is not simply $e^{\mu}$.


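The maximum likelihood estimates of $\mu_X$ and $\sigma_X^2$ can be shown to be

$$\hat{\mu}_X = e^{\bar{Y} + s^2/2} \qquad (36)$$

and

$$\hat{\sigma}_X^2 = e^{2\bar{Y} + s^2}\left(e^{s^2} - 1\right) \qquad (37)$$

where $\bar{Y}$ and $s^2$ denote the sample mean and variance of the normally distributed variable $Y = \log X$ (for exact maximum likelihood estimates, $s^2$ is computed with divisor $n$ rather than $n-1$).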

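If Eqs. (34) and (35) are solved for $\mu$ and $\sigma^2$,

$$\mu = \log\mu_X - \frac{1}{2}\log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right) = \log\frac{\mu_X}{\sqrt{1 + \sigma_X^2/\mu_X^2}} \qquad (38)$$

and

$$\sigma^2 = \log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right) \qquad (39)$$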
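The forward relations, Eqs. (34) and (35), and their inverses, Eqs. (38) and (39), can be checked against each other with a round trip in Python. The function names are illustrative, and the moments $8.2$ and $4.3$ (the Table 1 mean and sample variance, in units of $10^7\ (\text{psi})^2$ and $10^{14}\ (\text{psi})^4$) are used only as convenient test values. Working in these scaled units shifts $\mu$ by $\log 10^7$ but leaves $\sigma^2$ unchanged.

```python
import math


def lognormal_moments(mu, sigma2):
    """Eqs. (34)-(35): mean and variance of X = e^Y, Y ~ N(mu, sigma2)."""
    mean_x = math.exp(mu + sigma2 / 2.0)
    var_x = math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - 1.0)
    return mean_x, var_x


def lognormal_parameters(mean_x, var_x):
    """Eqs. (38)-(39): mu and sigma^2 of log X from the mean and variance of X."""
    sigma2 = math.log(1.0 + var_x / mean_x ** 2)
    mu = math.log(mean_x) - sigma2 / 2.0
    return mu, sigma2


mu, sigma2 = lognormal_parameters(8.2, 4.3)
mean_back, var_back = lognormal_moments(mu, sigma2)
print(f"mu = {mu:.4f}, sigma^2 = {sigma2:.4f}; "
      f"round trip: {mean_back:.2f}, {var_back:.2f}")
```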


Let

$$Z = \frac{\overline{\log X} - \mu}{\sigma/\sqrt{n}} \qquad (40)$$

where

$$\overline{\log X} = \frac{1}{n}\sum_{i=1}^{n}\log X_i \qquad (41)$$

and $\sigma^2$ is estimated by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\log X_i - \overline{\log X}\right)^2 \qquad (42)$$

The sample size $n$ which is required to put $\overline{\log X}$ within plus or minus $d'$ units of $\mu$ with a probability of $(1-\alpha)$ is, from Eq. (27),

$$n = \frac{\sigma^2\,Z_{\alpha/2}^2}{(d')^2} \qquad (43)$$

where $Z_{\alpha/2}$ denotes the $100\,\alpha/2$ percentage point of the standardized normal distribution. That is, if $\overline{\log X}$ is computed from $n$ samples, then

$$\mathrm{Prob}\left(|\overline{\log X} - \mu| \le d'\right) \ge 1 - \alpha \qquad (44)$$


Now a formula will be developed to determine the sample size needed to estimate $\mu_X$ by $\bar{X}$ with a maximum error of $d$. Note that $\overline{\log X}$ estimates $\mu$, but not $\log\mu_X$. An estimator for $\log\mu_X$ is derived as follows. Let

$$w = \overline{\log X} + \frac{\sigma^2}{2} \qquad (45)$$
Since it is clear from Eq. (34) that

$$\log\mu_X = \mu + \frac{\sigma^2}{2} \qquad (46)$$

$w$ estimates $\log\mu_X$, or $e^w$ estimates $\mu_X$. Let $n$ be the sample size that satisfies the following relationship:

$$\mathrm{Prob}\left(|w - \log\mu_X| \le d'\right) \ge 1 - \alpha \qquad (47)$$

Substituting Eqs. (45) and (46) into (47) gives

$$\mathrm{Prob}\left(|\overline{\log X} - \mu| \le d'\right) \ge 1 - \alpha$$

In other words, Eqs. (44) and (47) are equivalent, and Eq. (43) can be used to estimate the sample size required to satisfy Eq. (47). Equation (47) can be written as

$$\mathrm{Prob}\left(w - d' \le \log\mu_X \le w + d'\right) \ge 1 - \alpha \qquad (48)$$

or

$$\mathrm{Prob}\left(e^{w-d'} \le \mu_X \le e^{w+d'}\right) \ge 1 - \alpha$$

Now, the objective is to determine the sample size $n$ such that $\bar{X}$ will satisfy the following:

$$\mathrm{Prob}\left(\bar{X} - d \le \mu_X \le \bar{X} + d\right) \ge 1 - \alpha \qquad (49)$$
By comparison with Eq. (48),

$$\bar{X} - d = e^{w - d'} \qquad (50)$$

or

$$\bar{X} + d = e^{w + d'} \qquad (51)$$

Denoting the solutions for $d'$ in Eqs. (50) and (51) by $d'_1$ and $d'_2$, respectively, it follows that

$$d'_1 = w - \log(\bar{X} - d) \qquad (52)$$

$$d'_2 = \log(\bar{X} + d) - w \qquad (53)$$

Let $n_1$ and $n_2$ be the sample sizes obtained by substituting $d'_1$ and $d'_2$ into Eq. (43). If $n$ is defined by

$$n = \max(n_1, n_2) \qquad (54)$$

then $n$ is the sample size sufficient to assure Eq. (49). Summarizing, the sample size $n$ which is required to put $\bar{X}$ within $\pm d$ units of $\mu_X$ with a probability of $1-\alpha$ is given by

$$n = \max\left\{\frac{\sigma^2 Z_{\alpha/2}^2}{\left[\overline{\log X} + \dfrac{\sigma^2}{2} - \log(\bar{X} - d)\right]^2},\ \frac{\sigma^2 Z_{\alpha/2}^2}{\left[\log(\bar{X} + d) - \overline{\log X} - \dfrac{\sigma^2}{2}\right]^2}\right\} \qquad (55)$$


For example, suppose the illustrative example of Section 4.1 is reworked using a log normal assumption for the sampling distribution. The parameters required for a solution by Eq. (55) include $\bar{X}$ and $\overline{\log X}$, which can be computed from the preliminary data given in Section 2.3, and $\sigma^2$, which can be estimated by $s^2$.

Assume that $\mu_X$ is to be estimated with an error $d$ of less than $10^7\ (\text{psi})^2$ with a probability $1-\alpha$ of 0.90. Then, from the data in Table 1 and from Eqs. (41) and (42),

$$\overline{\log X} = \frac{1}{9}\sum_{i=1}^{9}\log X_i = 18.1984, \qquad s^2 = \frac{1}{8}\sum_{i=1}^{9}\left(\log X_i - \overline{\log X}\right)^2 = 0.0697$$

and

$$\log(\bar{X} + d) = \log\left(8.2\times10^7 + 10^7\right) = 18.3373; \qquad \log(\bar{X} - d) = \log\left(8.2\times10^7 - 10^7\right) = 18.0922$$

Then,

$$n = \max\left\{\frac{0.0697\,(1.65)^2}{\left[18.1984 + \dfrac{0.0697}{2} - 18.0922\right]^2},\ \frac{0.0697\,(1.65)^2}{\left[18.3373 - 18.1984 - \dfrac{0.0697}{2}\right]^2}\right\} = \max\{9.5;\ 17.5\}$$

Therefore, the sample size required to assure the stated accuracy under a log normal assumption would be 18. Note that since the assumed distribution is skewed, the estimation error is not symmetrical about the mean.
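The chain of Eqs. (45), (52), (53), and (43) that leads to Eq. (55) can be sketched in Python. The function name is illustrative; the inputs are the example's rounded intermediate values ($\overline{\log X} = 18.1984$, $s^2 = 0.0697$ used in place of $\sigma^2$, $Z_{.05} = 1.65$), with $\bar{X}$ and $d$ in $(\text{psi})^2$ and natural logarithms throughout.

```python
import math


def sample_size_lognormal_mean(mean_log_x, s2, x_bar, d, z):
    """Eq. (55): sizes n1, n2 from d'_1 and d'_2; the answer is max(n1, n2)."""
    if d >= x_bar:
        raise ValueError("d must be smaller than x_bar so log(x_bar - d) exists")
    w = mean_log_x + s2 / 2.0              # Eq. (45), s2 in place of sigma^2
    d1 = w - math.log(x_bar - d)           # Eq. (52)
    d2 = math.log(x_bar + d) - w           # Eq. (53)
    if d1 <= 0 or d2 <= 0:
        raise ValueError("w must lie between log(x_bar - d) and log(x_bar + d)")
    n1 = s2 * z ** 2 / d1 ** 2             # Eq. (43) with d' = d'_1
    n2 = s2 * z ** 2 / d2 ** 2             # Eq. (43) with d' = d'_2
    return n1, n2


n1, n2 = sample_size_lognormal_mean(18.1984, 0.0697, 8.2e7, 1.0e7, 1.65)
print(f"n1 = {n1:.1f}, n2 = {n2:.1f}, n = {math.ceil(max(n1, n2))}")
```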

Sample size requirements associated with estimating the probability of exceeding a critical level are determined as they were for the normal distribution; that is, Eq. (32) applies for the log normal case. However, instead of being defined as in Eq. (29), $P$ in this case is the integral of Eq. (33):

$$P = \int_{X_c}^{\infty} f_X(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{X_c}^{\infty}\frac{1}{x}\,e^{-(\log x - \mu)^2/2\sigma^2}\,dx \qquad (56)$$
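Because $X \ge X_c$ exactly when $\log X \ge \log X_c$, the integral of Eq. (56) reduces to the upper tail of the normal distribution of $\log X$, which `math.erfc` evaluates directly. In the Python sketch below, $\mu = 18.1984$ and $\sigma^2 = 0.0697$ are the log normal example's estimates, while the critical level $X_c = 10^8\ (\text{psi})^2$ and the tolerance $d = 0.1$ on $P$ are hypothetical; the resulting $P$ is then fed into Eq. (32).

```python
import math


def exceedance_probability_lognormal(mu, sigma2, x_c):
    """Eq. (56): P = Prob(X >= x_c) when log X ~ N(mu, sigma2).

    Uses Prob(log X >= log x_c) = 0.5 * erfc(z / sqrt(2)),
    with z = (log x_c - mu) / sigma.
    """
    z = (math.log(x_c) - mu) / math.sqrt(sigma2)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_exceed = exceedance_probability_lognormal(18.1984, 0.0697, 1.0e8)
n_p = 1.65 ** 2 * p_exceed * (1.0 - p_exceed) / 0.1 ** 2 + 1.0   # Eq. (32)
print(f"P = {p_exceed:.3f}, n = {math.ceil(n_p)}")
```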
4.3 A NONPARAMETRIC METHOD FOR SAMPLE SIZE DETERMINATION

For the case when it is judged undesirable to assume a specific distribution of data values, the Tchebycheff inequality can be applied to determine a sample size nonparametrically. The Tchebycheff inequality states that for every $k > 0$,

$$\mathrm{Prob}\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2} \qquad (57)$$

The significance of this relationship is that the area under a probability density curve located outside of $\mu \pm k\sigma$ will not exceed $1/k^2$, regardless of the distribution. Applying Eq. (57) to the sample mean $\bar{X}$, whose standard deviation is $\sigma/\sqrt{n}$, and using the notation of Eq. (24),

$$\alpha = \frac{1}{k^2} \quad\text{or}\quad k = \frac{1}{\sqrt{\alpha}}, \qquad d = \frac{k\,\sigma}{\sqrt{n}} \qquad (58)$$

Thus, a conservative estimate of the number of points in a sample required to place $\bar{X}$ within $\pm d$ units of the true mean $\mu$ with probability $(1-\alpha)$, for any distribution of $X$, is

$$n = \frac{\sigma^2}{\alpha\,d^2} \qquad (59)$$

Similarly, the sample size required to estimate $P$ within a given tolerance for a specified probability is given by

$$n = \frac{P(1-P)}{\alpha\,d^2} \qquad (60)$$

It can easily be shown that for any $P$ in the interval zero to one, $P(1-P) \le 1/4$. Therefore, the upper bound on Eq. (60) is

$$n = \frac{1}{4\,\alpha\,d^2} \qquad (61)$$

To demonstrate that this method leads to conservative sample size requirements, consider the application of Eq. (59) to the previous example problem. Using the preliminary data in Table 1 to compute an estimate for $\sigma^2$, and specifying an allowable error of $\pm 10^7\ (\text{psi})^2$ with probability $1-\alpha = 0.90$, the total number of measurement points would be

$$n = \frac{4.3\times10^{14}}{0.10\,\left(10^7\right)^2} = 43$$

Clearly, this nonparametric method is quite inefficient when additional information about the sampling distribution exists: it calls for 43 points, compared with 12 under the normality assumption and 18 under the log normal assumption. However, it does represent an upper bound on the sample size and has engineering applications.
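The nonparametric bounds of Eqs. (59) through (61) translate directly into Python; the helper names are illustrative. The first call reproduces the example above, and the second uses a hypothetical tolerance $d = 0.1$ on $P$ with no prior estimate of $P$, so the bound of Eq. (61) applies. In practice the results are rounded up to the next whole point.

```python
def sample_size_chebyshev_mean(sigma2, d, alpha):
    """Eq. (59): n = sigma^2 / (alpha * d^2), valid for any distribution."""
    return sigma2 / (alpha * d ** 2)


def sample_size_chebyshev_probability(d, alpha, p=None):
    """Eq. (60); falls back to the bound of Eq. (61) when p is not given."""
    p_term = 0.25 if p is None else p * (1.0 - p)   # P(1 - P) <= 1/4
    return p_term / (alpha * d ** 2)


# Example of the text: sigma^2 = 4.3e14 (psi)^4, d = 1e7 (psi)^2, 1 - alpha = 0.90
n_mean = sample_size_chebyshev_mean(4.3e14, 1e7, 0.10)
# Hypothetical tolerance d = 0.1 on P, no prior estimate of P
n_prob = sample_size_chebyshev_probability(0.1, 0.10)
print(f"Eq. (59): n = {n_mean:.1f}; Eq. (61): n = {n_prob:.1f}")
```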
APPENDIX

SUMMARY OF IMPORTANT RELATIONSHIPS

1. RANDOM SAMPLING

a. Mean, Eq. (1)
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

b. Variance, Eq. (4)
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

c. Probability of Exceeding Critical Level $X_c$, Eq. (6)
$$\hat{P} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad Y_i = \begin{cases} 1 & \text{if } X_i \ge X_c \\ 0 & \text{if } X_i < X_c \end{cases}$$

2. STRATIFIED SAMPLING

a. Mean, Eq. (10)
$$\bar{X} = \sum_{h=1}^{L} W_h\,\bar{X}_h$$

b. Probability of Exceeding Critical Level $X_c$, Eq. (15)
$$\hat{P} = \sum_{h=1}^{L} W_h\,\hat{P}_h$$

3. OPTIMUM ALLOCATION IN STRATIFIED SAMPLING

a. Mean Value Estimation, Eqs. (18) and (19)
$$n_h = n\,\frac{A_h\,\sigma_h}{\sum_{h=1}^{L} A_h\,\sigma_h}$$
$$n_h = n\,\frac{A_h}{A} \quad \text{(simple stratification)}$$

b. Probability Estimation, Eq. (20)
$$n_h = n\,\frac{A_h\sqrt{P_h(1-P_h)}}{\sum_{h=1}^{L} A_h\sqrt{P_h(1-P_h)}}$$

c. Comparison of Mean Values in $L$ Zones, Eq. (23)
$$n_h = n\,\frac{\sigma_h}{\sum_{h=1}^{L}\sigma_h}$$

4. SAMPLE SIZE UNDER A NORMALITY ASSUMPTION

a. Mean Value Estimation, Eq. (27)
$$n = \frac{\sigma^2\,Z_{\alpha/2}^2}{d^2}$$

b. Probability Estimation, Eq. (32)
$$n = \frac{Z_{\alpha/2}^2\,P(1-P)}{d^2} + 1$$

5. SAMPLE SIZE UNDER A LOG NORMAL ASSUMPTION

a. Mean Value Estimation, Eq. (55)
$$n = \max\left\{\frac{\sigma^2 Z_{\alpha/2}^2}{\left[\overline{\log X} + \dfrac{\sigma^2}{2} - \log(\bar{X} - d)\right]^2},\ \frac{\sigma^2 Z_{\alpha/2}^2}{\left[\log(\bar{X} + d) - \overline{\log X} - \dfrac{\sigma^2}{2}\right]^2}\right\}$$

b. Probability Estimation, Eq. (32), with $P$ from Eq. (56)
$$n = \frac{Z_{\alpha/2}^2\,P(1-P)}{d^2} + 1$$

6. SAMPLE SIZE NONPARAMETRICALLY

a. Mean Value Estimation, Eq. (59)
$$n = \frac{\sigma^2}{\alpha\,d^2}$$

b. Probability Estimation, Eqs. (60) and (61)
$$n = \frac{P(1-P)}{\alpha\,d^2} \le \frac{1}{4\,\alpha\,d^2}$$